{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {},
  "cells": [
    {
      "metadata": {},
      "source": [
        "<td>\n",
        "   <a target=\"_blank\" href=\"https://labelbox.com\" ><img src=\"https://labelbox.com/blog/content/images/2021/02/logo-v4.svg\" width=256/></a>\n",
        "</td>"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "<td>\n",
        "<a href=\"https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/basics/data_row_metadata.ipynb\" target=\"_blank\"><img\n",
        "src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
        "</td>\n",
        "\n",
        "<td>\n",
        "<a href=\"https://github.com/Labelbox/labelbox-python/tree/master/examples/basics/data_row_metadata.ipynb\" target=\"_blank\"><img\n",
        "src=\"https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white\" alt=\"GitHub\"></a>\n",
        "</td>"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Data Row Metadata\n",
        "\n",
        "Metadata is useful to better understand data on the platform to help with labeling review, model diagnostics, and data selection. This **should not be confused with attachments**. Attachments provide additional context for labelers but is not searchable within Catalog."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "## Metadata ontology\n",
        "\n",
        "We use a similar system for managing metadata as we do feature schemas. Metadata schemas are strongly typed to ensure we can provide the best experience in the App. Each metadata field can be uniquely accessed by id. Names are unique within the kind of metadata, reserved or custom. A DataRow can have a maximum of 5 metadata fields at a time.\n",
        "\n",
        "### Metadata kinds\n",
        "\n",
        "* **Enum**: A classification with options, only one option can be selected at a time\n",
        "* **DateTime**: A utc ISO datetime \n",
        "* **String**: A string of less than 500 characters\n",
        "\n",
        "### Reserved fields\n",
        "\n",
        "* **tag**: a free text field\n",
        "* **split**: enum of train-valid-test\n",
        "* **captureDateTime**: ISO 8601 datetime field. All times must be in UTC\n",
        "\n",
        "### Custom fields\n",
        "\n",
        "* **Embedding**: 128 float 32 vector used for similarity. To upload custom embeddings use the following [tutorial](https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/basics/custom_embeddings.ipynb)\n",
        "* Any metadata kind can be customized"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "## Setup"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "!pip install -q \"labelbox[data]\""
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "import labelbox as lb\n",
        "from datetime import datetime\n",
        "from pprint import pprint\n",
        "from labelbox.schema.data_row_metadata import DataRowMetadataKind\n",
        "from uuid import uuid4"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "# Add your api key\n",
        "API_KEY = \"\"\n",
        "client = lb.Client(api_key=API_KEY)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Get the current metadata ontology "
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "mdo = client.get_data_row_metadata_ontology()"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "# list all your metadata ontology as a dictionary accessable by id\n",
        "metadata_ontologies = mdo.fields_by_id\n",
        "pprint(metadata_ontologies, indent=2)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Access metadata by name"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "split_field = mdo.reserved_by_name[\"split\"]\n",
        "split_field"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "tag_field = mdo.reserved_by_name[\"tag\"]\n",
        "tag_field"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "train_field = mdo.reserved_by_name[\"split\"][\"train\"]\n",
        "train_field"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Export metadata fields"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "data_row_ids = [\"<data_row_id>\"]\n",
        "global_keys = [\"<global_key>\"]\n",
        "\n",
        "# These methods provide validation that the ID you are passing is a global key or data row id ,\n",
        "# this method also assures that all IDs are unique\n",
        "datarow_identifiers = lb.DataRowIds(data_row_ids)\n",
        "global_key_identifiers = lb.GlobalKeys(global_keys)\n",
        "\n",
        "# Use one of the identifiers\n",
        "mdo.bulk_export(data_row_ids=global_key_identifiers)\n",
        "# mdo.bulk_export(data_row_ids=datarow_identifiers)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Delete metadata fields from a data row"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "global_key = \"<global_key>\"\n",
        "schema_ids_to_delete =[\"<metadata_schema_id>\"]\n",
        "data_row_id = \"<data_row_id>\"\n",
        "\n",
        "deletions = [\n",
        "    lb.DeleteDataRowMetadata(data_row_id=lb.GlobalKey(global_key), fields=schema_ids_to_delete)\n",
        "    ]\n",
        "\n",
        "# Delete the specified metadata on the data row\n",
        "mdo.bulk_delete(deletes=deletions)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "## Construct metadata fields for existing metadata schemas\n",
        "\n",
        "To construct a metadata field you must provide the name for the metadata field and the value that will be uploaded. You can either construct a DataRowMetadataField object or specify the name and value in a dictionary format.\n",
        "\n",
        "\n",
        "\n"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "Option 1: Specify metadata with a list of `DataRowMetadataField` objects. This is the recommended option since it comes with validation for metadata fields."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Construct a metadata field of string kind\n",
        "tag_metadata_field = lb.DataRowMetadataField(\n",
        "    name=\"tag\",\n",
        "    value=\"tag_string\",\n",
        ")\n",
        "\n",
        "# Construct an metadata field of datetime kind\n",
        "capture_datetime_field = lb.DataRowMetadataField(\n",
        "    name=\"captureDateTime\",\n",
        "    value=datetime.utcnow(),\n",
        ")\n",
        "\n",
        "# Construct a metadata field of Enums options\n",
        "split_metadata_field = lb.DataRowMetadataField(\n",
        "    name=\"split\",\n",
        "    value=\"train\",\n",
        ")\n"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "Option 2: You can also specify the metadata fields with dictionary format without declaring the `DataRowMetadataField` objects.\n"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Construct a dictionary of string metadata\n",
        "tag_metadata_field_dict = {\n",
        "    \"name\": \"tag\",\n",
        "    \"value\": \"tag_string\",\n",
        "}\n",
        "\n",
        "# Construct a dictionary of datetime metadata\n",
        "capture_datetime_field_dict = {\n",
        "    \"name\": \"captureDateTime\",\n",
        "    \"value\": datetime.utcnow(),\n",
        "}\n",
        "\n",
        "# Construct a dictionary of Enums options metadata\n",
        "split_metadata_field_dict = {\n",
        "    \"name\": \"split\",\n",
        "    \"value\": \"train\",\n",
        "}\n",
        "\n"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "## Create a custom metadata schema with their corresponding fields\n"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Final\n",
        "custom_metadata_fields = []\n",
        "\n",
        "# Create the schema for the metadata\n",
        "number_schema = mdo.create_schema(\n",
        "  name=\"numberMetadataCustom\",\n",
        "  kind=DataRowMetadataKind.number\n",
        ")\n",
        "\n",
        "# Add fields to the metadata schema\n",
        "data_row_metadata_fields_number = lb.DataRowMetadataField(\n",
        "    name=number_schema.name,\n",
        "    value=5.0\n",
        ")\n",
        "\n",
        "custom_metadata_fields.append(data_row_metadata_fields_number)\n"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "# Create the schema for an enum metadata\n",
        "custom_metadata_fields = []\n",
        "\n",
        "enum_schema = mdo.create_schema(\n",
        "  name=\"enumMetadata\",\n",
        "  kind=DataRowMetadataKind.enum,\n",
        "  options=[\"option1\", \"option2\"]\n",
        ")\n",
        "\n",
        "# Add fields to the metadata schema\n",
        "data_row_metadata_fields_enum_1 = lb.DataRowMetadataField(\n",
        "  name=enum_schema.name,\n",
        "  value=\"option1\"\n",
        ")\n",
        "custom_metadata_fields.append(data_row_metadata_fields_enum_1)\n",
        "\n",
        "\n",
        "data_row_metadata_fields_enum_2 =  lb.DataRowMetadataField(\n",
        "  name=enum_schema.name,\n",
        "  value=\"option2\"\n",
        ")\n",
        "custom_metadata_fields.append(data_row_metadata_fields_enum_2)\n",
        "\n"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "# Inspect the newly created metadata schemas\n",
        "metadata_ontologies = mdo.fields_by_id\n",
        "pprint(metadata_ontologies, indent=2)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "## Create data rows with metadata\n",
        "\n",
        "See our [documentation](https://docs.labelbox.com/docs/limits) for information on limits for uploading data rows in a single API operation."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# A simple example of uploading data rows with metadata\n",
        "dataset = client.create_dataset(name=\"Simple Data Rows import with metadata example\")\n",
        "global_key = \"s_basic.jpg\"\n",
        "data_row = {\"row_data\": \"https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg\", \"global_key\": global_key}\n",
        "# This line works with dictionaries as well as schemas and fields created with DataRowMetadataField\n",
        "data_row[\"metadata_fields\"] =  custom_metadata_fields + [ split_metadata_field , capture_datetime_field_dict, tag_metadata_field ]\n",
        "\n",
        "\n",
        "task = dataset.create_data_rows([data_row])\n",
        "task.wait_till_done()\n",
        "result_task = task.result\n",
        "print(result_task)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "## Update data row metadata"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Get the data row that was uploaded in the previous cell\n",
        "num_schema = mdo.get_by_name(\"numberMetadataCustom\")\n",
        "\n",
        "# Update the metadata\n",
        "updated_metadata = lb.DataRowMetadataField(\n",
        "  schema_id=num_schema.uid,\n",
        "  value=10.2\n",
        ")\n",
        "\n",
        "# Create data row payload\n",
        "data_row_payload = lb.DataRowMetadata(\n",
        "  global_key=global_key,\n",
        "  fields=[updated_metadata]\n",
        ")\n",
        "\n",
        "# Upsert the fields with the update metadata for number-metadata\n",
        "mdo.bulk_upsert([data_row_payload])"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "## Update metadata schema"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# update a name\n",
        "number_schema = mdo.update_schema(name=\"numberMetadataCustom\", new_name=\"numberMetadataCustomNew\")\n",
        "\n",
        "# update an Enum metadata schema option's name, this only applies to Enum metadata schema.\n",
        "enum_schema = mdo.update_enum_option(\n",
        "  name=\"enumMetadata\",\n",
        "  option=\"option1\",\n",
        "  new_option=\"option3\"\n",
        ")"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "## Accessing metadata\n",
        "\n",
        "You can examine an individual data row, including its metadata."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "data_row = next(dataset.data_rows())\n",
        "for metadata_field in data_row.metadata_fields:\n",
        "  print(metadata_field[\"name\"], \":\", metadata_field[\"value\"])"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "You can bulk export metadata using data row IDs."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "data_rows_metadata = mdo.bulk_export([data_row.uid])\n",
        "len(data_rows_metadata)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "## Delete custom metadata schema \n",
        "You can delete custom metadata schema by name. If you wish to delete a metadata schema, uncomment the line below and insert the desired name."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "#status = mdo.delete_schema(name=\"<metadata schema name>\")"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "## Upload/delete/update custom embedding metadata for existing data rows\n",
        "\n",
        "For a complete tutorial on how to update, upload and delete custom embeddings please follow the steps in this [tutorial](https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/basics/custom_embeddings.ipynb).\n",
        "\n"
      ],
      "cell_type": "markdown"
    }
  ]
}